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Functional linear regression analysis aims to model regression relations which include a func- 
tional predictor. The analog of the regression parameter vector or matrix in conventional mul- 
tivariate or multiple-response linear regression models is a regression parameter function in one 
or two arguments. If, in addition, one has scalar predictors, as is often the case in applications 
to longitudinal studies, the question arises how to incorporate these into a functional regres- 
sion model. We study a varying-coefficient approach where the scalar covariates are modeled 
as additional arguments of the regression parameter function. This extension of the functional 
linear regression model is analogous to the extension of conventional linear regression models 
to varying-coefficient models and shares its advantages, such as increased flexibility; however, 
the details of this extension are more challenging in the functional case. Our methodology com- 
bines smoothing methods with regularization by truncation at a finite number of functional 
principal components. A practical version is developed and is shown to perform better than 
functional linear regression for longitudinal data. We investigate the asymptotic properties of 
varying-coefficient functional linear regression and establish consistency properties. 

Keywords: asymptotics; eigenfunctions; functional data analysis; local polynomial smoothing; 
longitudinal data; varying-coefficient models 



1. Introduction 

Functional linear regression analysis is an extension of ordinary regression to the case 
where predictors include random functions and responses are scalars or functions. This 
methodology has recently attracted increasing interest due to its inherent applicability 
in longitudinal data analysis and other areas of modern data analysis. For an excel- 
lent introduction, see Ramsay and Silverman (2005). Assuming that predictor process 
X possesses a square- integrable trajectory (i.e., X E L'^{S), where 5 C M), commonly 
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considered functional linear regression models include 

E{Y\X) ^iiY+ I m{X{s) ' ds, (1.1) 

Js 

with a scalar response F € M, and 

E{Y{t)\X)=iiY{t)+ I fi{s,t){X{s)-iix{s))ds, (1.2) 
Js 

with a functional response Y S L^{T) and T being a subset of the real line R, where 
Hx{s) = E{X{s)), seS and fiyit) = E{Y{t)), t e T (Ramsay and Dalzell (1991)). In 
analogy to the classical regression case, estimating equations for the regression function 
are based on minimizing the deviation 

r(s,t)= argmin eU (Y{t)-iiY{t)-( /3(s,t)[X(s) - /ijf (s)] ds) dt 

/3eL2(sxr) lJr\ Js J 

and analogously for (1.1). To provide a regularized estimator, one approach is to expand 
/3 ( • , • ) in terms of the eigenf unctions of the covariance functions of X and Y , and to use an 
appropriately chosen finite number of the resulting functional principal component (FPC) 
scores of X as predictors; see, for example, Silverman (1996), Ramsay and Silverman 
(2002, 2005), Besse and Ramsay (1986), Ramsay and Dalzell (1991), Rice and Silverman 
(1991), James et al. (2000), Cucvas et al. (2002), Cardot et al. (2003), Hall and Horowitz 
(2007), Cai and Hall (2006), Cardot (2007) and many others. 

Advances in modern technology enable us to collect massive amounts of data at fairly 
low cost. In such settings, one may observe scalar covariates, in addition to functional 
predictor and response trajectories. For example, when predicting a response such as 
blood pressure from functional data, one may wish to utilize functional covariates, such 
as body mass index, and also additional non-functional covariates Z , such as the age of 
a subject. It is often realistic to expect the regression relation to change as an additional 
covariate such as age varies. To broaden the applicability of functional linear regression 
models, we propose to generalize models (1.1) and (1.2) by allowing the slope function 
to depend on some additional scalar covariates Z . Previous work on varying-coefficient 
functional regression models, assuming the case of a scalar response and of continuously 
observed predictor processes, is due to Cardot and Sarda (2008) and recent investigations 
of the varying-coefficient approach include Fan et al. (2007) and Zhang et al. (2008). 

For ease of presentation, we consider the case of a one-dimensional covariate Z G Z cM., 
extending (1.1) and (1.2) to the varying-coefhcient functional linear regression models 

EiY\X, Z)^^lY\z+ I s)(^(s) - /ix|z(s)) ds (1.3) 
Js 

aJid 

E{Y[£)\X, Z) = tiY\z{i) + P{Z, s, t)iXis) ~ /ix|z(s)) ds (1.4) 
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for scalar and functfonal responses, respectively, with corresponding characterizations for 
the regression parameter functions 



Z = z 



f3*iz,s)= argmin EUY - ^lY\z - l3iZ,s)[X{s) - tix\z{s)]ds 

I3*{z,s,t)= argmin e\ f (vit) - fiY\z{t) 
/3(2,-,-)ei2(5xr) UrV 



PiZ,s,t)[X{s)-^lx\z{s)]ds] dt 



Here, fJ-xizi^) and fJ-Y\zit) denote the conditional mean fmiction of X and y, given Z. 

Intuitively, after observing a sample of n observations, {Xi,Yi, Zi}f^i, the estimation 
of the varying slope functions can be achieved using kernel methods, as follows: 



^*{z, s) = argmin Kb{Zi - z) 



1 2 



Yi-HY\z,- I l3iZi,s)[Xi{s) - nx\Ziis)]ds 



and 



P*{z,s,t) = argmin y^A"b(Zi - z) 



T 



Y,it) - ^iY\zM - / f3{Z„s,t)[X,{s)~fixizM]<^s 
s 



dt 



for (1.3) and (1.4), respectively, where Ki,{z) = K{z/h)/h for a kernel function A'(-) and 
a bandwidth 6 > 0. The necessary regularization of the slope function is conveniently 
achieved by truncating the Karhunen-Loeve expansion of the covariance function for the 
predictor process (and the response process, if applicable). To avoid difficult technical 
issues and enable straightforward and rapid implementation, it is expedient to adopt the 
two-step estimation scheme proposed and extensively studied by Fan and Zhang (2000). 

To this end, we first bin our observations according to the values taken by the additional 
covariatc Z into a partition of Z. For each bin, we obtain the sample covariance functions 
based on the observations within this bin. Assuming that the covariance functions of 
the predictor and response processes are continuous in z guarantees that these sample 
covariance functions converge to the corresponding true covariance functions evaluated 
at the bin centers as bin width goes to zero and sample size increases. This allows 
us to estimate the slope function at each bin center consistently, using the technique 
studied in Yao et al. (2005b), providing initial raw estimates. Next, local linear smoothing 
(Fan and Gijbels (1996)) is applied to improve estimation efficiency, providing our final 
estimator of the slope function for any z & Z. 

The remainder of the paper is organized as follows. In Section 2, we introduce ba- 
sic notation and present our estimation scheme. Asymptotic consistency properties are 
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reported in Section 3. Finite-sample implementation issues are discussed in Section 4, 
results of simulation studies in Section 5 and real data applications in Section 6, with 
conclusions in Section 7. Technical proofs and auxiliary results are given in the Appendix. 

2. Varying coefficient functional linear regression for 
sparse and irregular data 

To facilitate the presentation, we focus on the case of a functional response, which remains 
largely unexplored. The case with a scalar response can be handled similarly. Wc also 
emphasize the case of sparse and irregularly observed data with errors, due to its relevance 
in longitudinal studies. The motivation of the varying-coefficicnt functional regression 
models (1.3) and (1.4) is to borrow strength across subjects, while adequately reflecting 
the effects of the additional covariatc. We impose the following smoothness conditions: 

[AO] The conditional mean and covariance functions of the predictor and response 
processes depend on Z and are continuous in Z , that is, /ix,z(s) ~ E{X{s)\Z = z), 
My,.(t) = E{Y{t)\Z = z), Gx,.(si,S2) = cov(X(si), X(s2)W = z), GyAtiM) = 
coY{Y{ti),Y{t2)\Z = z) and CxY,z{s,t) = cov{X{s),Y{t)\Z = z) are continuous 
in z and their respective arguments, and have continuous second order partial 
derivatives with respect to z. 

Note that [AO] implies that the conditional mean and covariance functions of predictor 
and response processes do not change radically in a small neighborhood of Z = z. This 
facilitates the estimation of l3{z^s,t), using the two-step estimation scheme proposed by 
Fan and Zhang (2000). While, there, the additional covariate Z is assumed to take values 
on a grid, in our case, Z is more generally assumed to be continuously distributed. In 
this case, we assume that the additional variable Z has a compact domain Z and its 
density fz{z) is continuous and bounded away from both zero and infinity. 

[Al] Z is compact, fz{z) G C", /z = inf^ez fz{z) > and fz = sup^^^ fz{z) < oo. 

2.1. Representing predictor and response functions via 

functional principal components for sparse and irregular 
data 

Suppose that we have observations on n subjects. For each subject i, conditional on 
Zi = Zi, the square-integrable predictor trajectory Xi and response trajectory Yi are un- 
observable realizations of the smooth random processes {X,Y\Z = Zi), with unknown 
mean and covariance functions (condition [AO]). The arguments of X{-) and Y{-) are 
usually referred to as time. Without loss of generality, their domains S and T are 
assumed to be finite and closed intervals. Adopting the general framework of func- 
tional data analysis, we assume, for each z, that there exist orthogonal expansions 
of the covariance functions Gx,z{-,-) (resp. Gy.z{-, )) in the L2 sense via the eigen- 
functions ■0z,m (resp. 4'z,k), with non-incrGS-sing eigenvalues Pz.m 

(resp. \z,k), that is, 

Gx,z[si,S2) = Z]m=l/^2,'nV'2,m(si)V'z,m(s2), GY,z{tlM) = Z^fcll ^z,k4'z,k{tl)4>z,k{t2) ■ 
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Instead of observing the full predictor trajectory Xj and response trajectory Yi, typ- 
ical longitudinal data consist of noisy observations that are made at sparse and ir- 
regularly spaced locations or time points, providing sparse measurements of predic- 
tor and response trajectories that are contaminated with additional measurement er- 
rors (Staniswalis and Lee (1998), Rice and Wu (2001), Yao et al. (2005a, 2005b)). To 
adequately reflect the situation of sparse, irregular and possibly subject-specific time 
points underlying these measurements, we assume that a random number Li (resp. Ni) 
of measurements for Xi (rcsp. Yi) is made, at times denoted by Sn, Si2, ■ ■ ■ , SiLi 
(resp. Tii,Ti2, . ■ . ,TiN.). Independent of any other random variables, the numbers of 
points sampled from each trajectory correspond to random variables Li and Ni that are 
assumed to be i.i.d. as L and N (which may be correlated), respectively. For 1 <i <n, 
1 <l < Li, 1 < j < Ni, let Uii (resp. Vij) be the observation of the random trajectory Xi 
(resp. Yi) made at a random time Su (resp. Tij), contaminated with measurement errors 
Eii (resp. Cij). Here, the random measurement errors en and Cij are assumed to be i.i.d., 
with mean zero and variances and ay , respectively. They are independent of all other 
random variables. The following two assumptions arc made. 

[A2] For each subject i, Li L (resp. Ni ''^^ TV) for a positive discrete- valued ran- 
dom variable with EL < oo (resp. EN < oo) and P{L > 1) > (resp. P{N > 1) > 



[A3] For each subject i, observations on Xi (resp. Yi) are independent of Li (resp. Ni), 
that is, {{Sii,Uii: I € Ci)} is independent of Li for any £i C {1,2, ... ,Li} 
(resp. {{Tij, Vij): j S M} is independent of Ni for any A/i C {1, 2, . . . , Ni}). 

It it surprising that under these "longitudinal assumptions" , where the number of ob- 
servations per subject is fixed and does not increase with sample size, one can nevertheless 
obtain asymptotic consistency results for the regression relation. This phenomenon was 
observed in Yao et al. (2005b) and is due to the fact that, according to (2.3), the target 
regression function depends only on localized eigenfunctions, localized eigenvalues and 
cross-covariances of localized functional principal components. However, even though lo- 
calized, these eigenfunctions and moments can be estimated from pooled data and do 
not require the fitting of individual trajectories. Even for the case of fitted trajectories, 
conditional approaches have been implemented successfully, even allowing reasonable 
derivative estimates to be obtained from very sparse data (Liu and Miiller (2009)). 

Conditional on Zi ~ z, the FPC scores of Xi and Yi are Cz,im = J[^i{s) — 
^J■x,zis)]1pz,■m{s)ds and ^^.^k = /[^i(s) - /iy,z(s)]</>z,fe(s) ds, respectively. For all z, these 
FPC scores Cz,im satisfy ECz.im = 0, corr(Cz,imi,G,im2) = for any mi 7^ m2 and 
var(Cz^im) = Pz,m\ analogous results hold for £,z,ik- With this notation, using the 
Karhunen-Loeve expansion as in Yao et al. (2005b), conditional on Zi, the available 
measurements of the ith predictor and response trajectories can be represented as 



0). 



Uii — Xi{Sii) + Eii 



(2.1) 



00 




KKL, 



m— 1 
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(2.2) 

OO 

= ^J■Y,z,iTij) + ^S,z,,ik(f>Zi.k{Tij) + Ey, 1 < j < Ni. 

k=l 

2.2. Estimation of the slope function 

For estimation of the slope function, one standard approach is to expand it in terms 
of an orthonornial functional basis and to estimate the coefficients of this expansion to 
estimate the slope function in the non-varying model (1.2) (Yao et al. (2005b)). As a 
result of the non-increasing property of the eigenvalues of the covariancc functions, such 
expansions of the slope function arc often efficient and require only a few components 
for a good approximation. Truncation at a finite number of terms provides the necessary 
regularization. Departing from Yao et al. (2005b), we assume here that an additional 
covariate Z plays an important role and must be incorporated into the model, motivating 
(1.4). To make this model as flexible as possible, the conditional mean and covariance 
functions of the predictor and response processes are allowed to change smoothly with 
the value of the covariate Z (Assumption [AO]), which facilitates implementation and 
analysis of the two-step estimation scheme, as in Fan and Zhang (2000). 

Efficient implementation of the two-step estimation scheme begins by binning sub- 
jects according to the levels of the additional covariate Zi, i = l,2,...,n. For ease of 
presentation, we use bins of equal width, although, in practice, non-equidistant bins can 
occasionally be advantageous. Denoting the bin centers by z^P\p = 1,2, . . .,P, and the 
bin width by h, the pth bin is [z^p^ - z'p) -f |) with /i = where \Z\ denotes the 
size of the domain of Z, 2^^^ — h/2 = inf{z: z e Z} and z^^' -f- h/2 = sup{z: z S Z} (note 
that the last bin is [z^^' - /i/2,z(^' + h/2]). Let TV;,^ = {t: Z,e[z - !^,z + |)} be the 
index set of those subjects falling into bin [z — ^,z + and n^ji = ^Afz,h the number 
of those subjects. 

2.2.1. Raw estimates 

For each bin [z'^p^ — |,z*^^'^ -I- |), we use the Yao et al. (2005a) method to obtain our 
raw estimates /ixz(p)(') I^y zm{') of the conditional mean trajectories and the raw 
slope function estimate /3(z^P\s,t). The corresponding raw estimates of a\ and ay are 
denoted by ct^ ^(p) and d'y ^(p) for p = 1, 2, . . . , P. For each I < p < P, the local linear 
scatterplot smoother of fix z(p) (s) is defined through minimizing 

E E-iff^)(c^..-do-di(^.,-.s))^ 

with respect to do and di, and setting ft-x z'~p'>{^) to be the minimizer dp, where ki(-) is 
a kernel function and bx^z(p) is the smoothing bandwidth, the choice of which will be 
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discussed in Section 4. We define a similar local linear scatterplot smoother of fiY,z(p) (0- 
According to Lemma 2 in the Appendix, raw estimates fix zip) (■s) 3'iid fly zipi (0 con- 
sistent uniformly for z^p\ p ~ 1,2, . . . , P, for appropriate bandwidths bx zip) by z(p) ■ 
Extending Yao et al. (2005b), the conditional slope function can be represented as 

P{Z, S,t)^Y.Y. -^^^^,rnis)Mt) (2.3) 



k—1 m—1 



^z,m 



for each z, where ipz.mi') and (f>z.k{') a-rc the cigenfunctions of covariance functions 
Gx,z{', •) and Gy^zi', ■)■, respectively, and Cz.m and (,z,k are the functional principal com- 
ponent scores of X and Y , respectively, conditional on Z = z. 

To obtain raw slope estimates P{z^'''\s,t) for p~l,2,...,P, we first estimate 
the conditional covariance functions G^.zCp) (si, S2), Gy .,{p){ti,t2) and Cxy^z'~p){s,t) 
at each bin center, based on the observations falling into the bin, using the ap- 
proach of Yao et al. (2005b). From "raw" covariances, Gx i zwi^iiT^ik) = [Uij ~ 
P'X,z(p)iSlij)){Uik - jlx,z(p)iSik)) for l<j,k<Li, ieU^^^h and p = 1, 2, . . . , P, and 
the locally smoothed conditional covariance ^(p) (si, S2) is defined as the minimizer 
60 of the local linear problem 

E\ ^ / Sij — Si Sil — S2 
/ '^A~r 'T 

X [Gx,i,zw{SijiS,i) - bo- bii{Sij - si) - bi2{Sii - §2)]^, 

where K2{-,-) is a bivariate kernel function and ^x,z(p) a smoothing bandwidth. The 
diagonal "raw" covariances Gx^i^zip^i^ij^^ij) are removed from the objective function 
of the above minimization problem because EGx^i^z(p)i^ij^'^ii) ~ (^ov{X{Sij),X{Sii)) + 
5jia\, where 5ji = 1 \f j = I and otherwise. Analogous considerations apply for 
Gy^^{p){Tij,Tii). The diagonal "raw" covariances Gx^i^z(p)i^ijT ^ij) and Gy^^^^{p){Tij ,Tij) 
can be smoothed with bandwidths bx z(p) v and by^ipjy, respectively, to estimate 
^(p) (s) = Gjf^^cp) (s, s) + <Jx ^^'i ^Y,z<-p)i'f-) = Gy .,{p){t,t) + ay, respectively. The re- 
sulting estimators are denoted by Vx^2(p)(s) and VV.z(p)(i), respectively, and the differ- 
ences (Vx z(p> (s) ~ Gx z(p) (s, s)) (and analogously for Y) can be used to obtain estimates 
^x z<.p) ■'^^'^ '^x ^^'^ zip) by integration. Furthermore, "raw" conditional cross- 

covariances Ci^^(p){Sii,Tij) = {Uu - Ax,z(p) (5'ii))(^^y - iJ-y^zip^iTi])) are used to estimate 
CxY,zM{s,t), by minimizing 



E E E 



K2 



Sij s Tij t 



^l,z(p) ^2, zip) 

X [C,,(p, {Sa,T,,) -bo- biiiSu -s)- b^T,, - t)f 



with respect to bo, &11 and 612, and setting Gxy zip)i-^T't) to be the minimizer bo, with 
smoothing bandwidths hi ^(p) and /I2 zip) ■ 
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In (2.3), the slope function may be represented via the eigenvalues and eigenfunctions 
of the covariance operators. To obtain the estimates p^cp) ^ and V'zCp) mi') (resp. A^cp) 
and (/j^cp) j,(-)) of eigenvalue-eigenfunction pairs Pz(p).m ^^'^ V'z<j').m(') (resp. A^(p) and 
(t>z(p) fc('))i use conditional functional principal component analysis (CFPCA) for 
Gx z(p>{'t') (resp. Gy z(p){'t-)), by numerically solving the conditional eigenequations 

Pz(p),m^z(PKni{s2), m=l,2,..., (2.4) 

'^z<p),fc0z(p),fc(*2), fc=l,2,.... (2.5) 

Note that we estimate the conditional mean functions and conditional covariance func- 
tions over dense grids of S and T- Numerical integrations like the one on the left-hand 
side of (2.4) are done over these dense grids using the trapezoid rule. Note, further, that 
integrals over individual trajectories are not needed for the regression focus, in that we 
use conditional expectation to estimate principal component scores, as in (4.1). 
Due to the fact that 

oo oo 

,m{s)(j)z,k{t), 

k=l m=l 

we then obtain preliminary estimates of Uz.mk = E{(^z,m£,z.k) at the bin centers z^p\ 
p = 1, 2, . . . , P, by numerical integration, 

^z(p} .mk = J^J^ ^zip) ^m{s)CxY,z(p) {s, t)<t>zip) ds dt. (2.6) 

With (2.3), (2.4), (2.5) and (2.6), the raw estimates of P{z'-P\s,t) are 

K M ~ 

^(z(P),S,t)^Y.T.^^^^^^^^Krn{s)4>ziPKk{t). (2.7) 
fc=lm=l ^^'P^™ 

Further details on the "global" case can be found in Yao et al. (2005b). 
2.2.2. Refining the raw estimates 

We establish in the Appendix that the raw estimates fix zip) (s), fi-Y z(p) i^) and P{z^p\ s, t) 
are consistent. As has been demonstrated in Fan and Zhang (2000), there are several 
reasons to refine such raw estimates. For example, the raw estimates are generally not 
smooth and are based on local observations, hence inefficient. Most importantly, appli- 
cations require that the function f3{z,s,t) is available for any z Cz Z. 

To refine the raw estimates, the classical approach is smoothing, for which we adopt 
the local polynomial smoother. Defining Cp = (1, z^^^ — z, . . . , {z^p^ — zY)'^ , p = 1, 2, . . . , P, 



Gx,z(p) {si,S2)ljjz(p),misi)<isi 

Gy.zM (^ij^2)0z(p),fc(ii) dii 



r 



738 



Y. Wu, J. Fan and H.-G. Miiller 



the local polynomial smoothing weights for estimating the gth derivative of an underlying 
function are 

z, h) = g!e^+i_,,+i(C^WC)-icpiffc(z(P) - z), p = 1, 2, . . . ,P, 

where C (ci, C2, . . . , cp)^, W = diag(A"fc(z(i) - z), Xb(z(2) - z), . . . , ii't(z(^) - z)) and 
Gq+i.r+i = (0, . . . , 0, 1, 0, . . . , 0)^ is a unit vector of length r + 1 with the l)th element 
being 1 (see Fan and Gijbcls (1996)). Our final estimators arc given by 



p 






p=l 






P 












p=l 






P 










,s,t) 


p=l 







Due to the assumption that the variance of the measurement error does not depend 
on the additional covariate, the final estimators of a\ and ay can be taken as simple 
averages, 

p p 

^^==E4,.(.)/^ and 4=E4.(p)/^- (2-8) 
p— 1 p=i 



Remark 1. The localization to Z ~ z, as needed for the proposed varying coefficient 
model, coupled with the extreme sparseness assumption [A2], which adequately reflects 
longitudinal designs, is not conducive to obtaining explicit results in terms of convergence 
rates for the general case. However, by suitably modifying our arguments and coupling 
them with the rates of convergence provided on page 2891 of Yao et al. (2005b), we can 
obtain rates if desired. These are the rates given there, which depend on complex intrinsic 
properties of the underlying processes, provided that the sample size n is everywhere 
replaced by nh, the sample size for each bin. 



Remark 2. In this work, we focus on sparse and irregularly observed longitudinal data. 
For the case where entire processes are observed without noise and are error-free, one 
can estimate the localized eigenfunctions at rates of L^-convergence of (n/i)"^/^ (see 
Hall et al. (2006)), where h is the smoothing bandwidth. For the moments of the func- 
tional principal components, a smoothing step is not needed. Known results will be 
adjusted by replacing n with nh when conditioning on a fixed covariate level Z ^ z; see 
Cai and Hall (2006) and Hall and Horowitz (2007). 



Varying- coejficient functional linear regression 

3. Asymptotic properties 



739 



We establish some key asymptotic consistency properties for the proposed estimators. 
Detailed technical conditions and proofs can be found in the Appendix. 

The observed data set is denoted by {Zi, {,Sii,Uii)^^-^,{Tij,Vij)^^^: i = 1, 2, . . . , n}. We 
assume that it comes from (1.2) and satisfies [AO], [Al], [A2] and [A3]. 

For h oc y/n, define the event 

i?„ = {minn^(p) ,j > n}, (3.1) 

where n^{p) j^ is the number of observations in the pth bin and n cx ^/n means that there 
exist Co and Co such that < co < \/n < Co < oo. It is shown in Proposition 1 in the 
Appendix that P{En) — >■ 1 as n — > oo for P oc n^/^, as specified by condition (xi). 

The global consistency of the final mean and slope function estimates follows from the 
following theorem. 

Theorem 1 (Consistency of time- varying functional regression). Under condi- 
tions [AO], [Al], [A2] and [A3] in Section 2 and conditions [A4], [A5] and (i)-(xi) in the 
Appendix, on the event £"„ with P{En) — >■ 1 as n-^ oo, we have 

ifiwAr)- l^w,z{r)f '^rdz^Q forW ^X,n^S and W Y,n = T , 

izJn 



{j3{z, s, t) - I3{z, s, t)f ds dt dz 4 0. 

Z JT JS 

To study prediction through time-varying functional regression, consider a new pre- 
dictor process X* with associated covariate Z* . The corresponding conditional expected 
response process Y* and its prediction Y* are given by 



(3.2) 



Y*{t)^E{Y{t)\X*,Z*) 

= t^Y,Z' (t) + P{Z\s, t){X* (s) - ^Jix,Z' is)) ds, 

Y*{t)^{iY.Z'{t)+ [ P{Z*,s,t){X*{s)-fLx,z'{s))ds. (3.3) 



Theorem 2 (Consistency of prediction). For a new predictor process X* with 
associated covariate Z* , it holds under the conditions of Theorem 1 that J^iY*(t) — 

Y*{t))^dt^O, where Y*{t) andY*{t) are given by (3.2) and (3.3). 
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4. Finite-sample implementation 

For the finite-sample case, several smoothing parameters need to be chosen. Following 
Yao et al. (2005a), the leave-one-curve-out cross-validation method can be used to se- 
lect smoothing parameters 6x.z(p) , &y,z(p), &x,z(p),yj ^Y,zip),v-: hx^zMi hy^zM, /Ji,z(p) and 
/12 2(p) , individually for each bin. Further required choices concern the bin width h, the 
smoothing bandwidth b and the numbers M and K of included expansion terms in (2.7). 
The method of cross-validation could also be used for these additional choices, but this 
incurs a heavy computational load. A fast alternative is a pseudo-Akaike information 
criterion (AIC) (or pseudo-Bayesian information criterion (BIC)). 

[1] Choose the number of terms in the truncated double summation representation 
(3{z''P\s,t) for M{n) and K{n), using AIC or BIC, as in Yao et al. (2005b). 

[2] For each bin width h, choose the best smoothing bandwidth b*{h) by minimizing 
AIC or BIC. 

[3] Choose the bin width h* which minimizes AIC or BIC, while, for each h investi- 
gated, we use b*{h) for b. 

For [1], we will choose M and K simultaneously for all bins, minimizing the conditional 
penalized pseudo-deviance given by 

^(^) = E E \^2^~^r^^ + N^ log(27T) + loga^^(„ \+P, 
p=iieAr^^'^Y,z(p) ) 

where V = 2PK for AIC and V = (\ogn)PK for BIC, with respect to K. Here, for ieMp, 

= Al>(p),i~Z]fcLl C(p),fe,i^z(p),fe,j ■with fj-Y.z'-P^.i = (Ay,z(p) {Til),. ■ ■,'i^Y,ziP) {TiNi)V , 

Vi = (Vii,...,K:AfJ^, 4>zi.pKkA = i4>z(p)ATii),---,4>z(p),kiTiN,)V and with estimated 
principal components 

&pi,k,i = ^z(p\k4>zi.p> ,k,i^Y,z(p\i{y''- ~ Ay,z(p),i)i (4-1) 

where T,Y,z(p),i is an Ni-hy-Ni matrix whose (j, fc)-element is Gy^zM (Ty j'^ifc) 2(p)<^jfc- 
Analogous criteria are used for the predictor process X, selecting K by minimizing 
AIC(A') and BIC(Ar). Marginal versions of these criteria are also available. 

In step [2], for each bin width h, we first select the best smoothing bandwidth b*{h) 
based on AIC or BIC and then select the final bin width h* by a second application of 
AIC or BIC, plugging b*{h) into this selection as follows. For a given bin width h, define 
the P-by-P smoothing matrix S0.2 whose (pi,p2)th element is ujq ,2(z(Pi),z(P=),5). The 
effective number of parameters of the smoothing matrix is then the trace of 8(^2^0,2 (cf. 
Wahba (1990)). This suggests minimization of 

AIC(6|/j) = El^ef + 7Vaog(27T) + iV, logaf-l + 2tr(S^2So,2), 

i=l '^Y ) 
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leading to b*{h), where 



with fly i = {f''Y,zi{Tii), . ■ . , fiy.ziiTiN-))'^ and estimated principal component scores 

C(p),m,i ~ Pz(p\k'^ zM ,m,i^x,z(p') ,1^^'- ^ f^X^Zid)- 

The definition of pseudo-BIC scores is analogous. 
In step [3] , to select the bin width h* , wc minimize 

AIC(/i, b*ih)) = ^1 -Lef + N, log(27t) + N, log <7^ 1 + 2MKP, 
or the analogous BIC score, using b*{h) for each h, as determined in the previous step. 

5. Simulation study 

We compare global functional linear regression and varying-coefficient functional linear 
regression through simulated examples with a functional response. For the case of a 
scalar response, the proposed varying-coefficient functional linear regression approach 
achieves similar performance improvements (results not reported). For the finite-sample 
case, there are several parameters to be selected (see Section 4). In the simulations, we 
use pseudo-AIC to select bin width h and pseudo-BIC to select the smoothing bandwidth 
b and the number of regularization terms Af (n) and Kin). 

The domains of predictor and response trajectories are chosen as 5 = [0, 10] 
and T= [0,10], respectively. The predictor trajectories X are generated as X{s) = 
P-x{s) + Y^m=i Cmipmis) for s G S, with mean predictor trajectory iJ.x{s) = (s + sin(s)), 
the three eigenfunctions are il^i{s) = — y^cos(7Ts/5), V'2(s) = y^sin(7Ts/5), V'3(s) = 

icos(27Ts/5) and their corresponding functional principal components are indepen- 
dently distributed as Ci ~ ^^(0,2^), C2 ^ N{0,Vf), C3 ^ iV(0, 1^). The additional co- 
variate Z is uniformly distributed over [0,1]. For z g [0,1], the slope function is linear 
in z, f3{z,s,t) = (z + l){tpi{s)ipi{t) + i'2{s)ip2{t) + i'3{s)ip3{t)) and the conditional re- 
sponse trajectory is E(Y{t)\X,Z = z) = iiY,z{t) + /3(z, s, i)(X(s) — /ix(s)) ds, where 
fJ-Y,z{t) = (1 -|- z){t + sin(t)). We consider the following two cases. 

Example 1 (Regular case). The first example focuses on the regular case with dense 
measurement design. Observations on the predictor and response trajectories are made 
at Sj = [j — l)/3 for j — 1,2, ...,31 and tj ~ {j — l)/3 for j = 1,2, ...,31, respectively. 
Wc assume the measurement errors on both the predictor and response trajectories are 
distributed as A^(0, 1^), that is, (j\ = l and ay = 1- 
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Table 1. Simulation results: mean and standard deviation of MISPE for 
global and varying-coefBcient functional linear regression with a functional 
response, for both regular and sparse cases 





Functional linear 


Varying-coefBcient functional 




regression 


linear regression 


Regular 


4.0146 (1.6115) 


0.7836 (0.4734) 


Sparse and irregular 


4.0013 (0.8482) 


1.0637 (0.3211) 



Example 2 (Sparse and irregular case). In this example, we make a random number 
of measurements on each trajectory in the training data set, chosen with equal probabil- 
ity from {2,3,..., 10}. We note that, for the same subject, the number of measurements 
on the predictor and the number of measurements on the response trajectory are inde- 
pendent. For any trajectory, given the number of measurements, the measurement times 
are uniformly distributed over the corresponding trajectory domain. The measurement 
error is distributed as A^(0, 1^) for both the predictor and the response trajectories. 



In both examples, the training sample size is 400. An independent test set of size 1000 
is generated with the predictor and response trajectories fully observed. We compare 
performance using mean integrated squared prediction error (MISPE) 



^ 1000 



1000 — jr 



E{Y*{t)\x;,z;) 

i^Y^z; it) + £ /3(Z;, 5, tKX*{s) - (s)) 



ds 



dt/\n 



analogously for the global functional linear regression, where {X* ,Y* , Z*) denotes the 
data of the jth subject in the independent test set. In Table 1, we report the mean and 
standard deviation (in parentheses) of the MISPE of the global and varying-coefficient 
functional linear regression over 100 repetitions for each case. This shows that in this 
simulation setting, the proposed varying-coefHcient functional linear regression approach 
reduces MISPE drastically, compared with the global functional linear regression, both 
for regular and sparse irregular designs. 

To visualize the differences between predicted conditional expected response trajecto- 
ries, for a small random sample, in both the regular and sparse and irregular design cases, 
we randomly choose four subjects from the test set with median values of the integrated 
squared prediction error (ISPE) for the varying-coefficient functional linear regression. 
The true and predicted conditional expected response trajectories are plotted in Figure 
1, where the left four panels correspond to the regular design case and the right four to 
the sparse irregular case. Clearly, the locally varying method is seen to be superior. 
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6. Applications 

We illustrate the comparison of the proposed varying-coefBcient functional linear model 
with the global functional linear regression in two applications. 

6.1. Egg- laying data 

The egg-laying data represent the entire reproductive history of one thousand Mediter- 
ranean fruit flies ('medflies' for short), where daily fecundity, quantified by the number 
of eggs laid per day, was recorded for each fly during its hfetime; see Carey et al. (1998) 
for details of this data set and experimental background. 

We are interested in predicting future egg-laying patterns over an interval of fixed 
length, but with potentially different starting time, based on the daily fecundity infor- 
mation during a fixed earlier period. The predictor trajectories were chosen as daily 
fecundity between day 8 and day 17. This interval covers the tail of an initial rapid rise 
to peak egg-laying and the initial part of the subsequent decline and, generally, the egg- 
laying behavior at and near peak egg-laying is included. It is of interest to study in what 
form the intensity of peak cgg-laying is associated with subsequent egg-laying behavior, 
as trade-offs may point to constraints that may play a role in the evolution of longevity. 

While the predictor process is chosen with a fixed domain, the response process has 
a moving domain, with a fixed length of ten days, but a different starting age for 
each subject, which serves as the additional covariate Z. Due to the limited num- 
ber of subjects in this study, we use a pre-specificd discrete set for the values of Z: 



z=0.0309 z=0.3037 z=00550 2=02993 




02468 10 02468 10 02468 10 02468 10 



Figure 1. In one random repetition, the true (solid) conditional expected response trajectories 
and predicted conditional expected response trajectories via the global functional linear regres- 
sion (dot-dashed) and the varying-coefficient functional linear regression (dashed) are plotted 
for four randomly selected subjects in the independent test set with median integrated squared 
prediction error. The left four panels and the right four correspond to the regular and sparse 
irregular cases, respectively. 
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z=17 z=19 z=21 




Figure 2. The slope functions estimated by tlie varying-coefRcient functional linear regression 
at different levels of the additional covariate z for the egg-laying data. 



2^ = {17,19,21,23,25,27,29,31,33} with a pre-specified bin width /i = 2. For subject i 
with Zi € Z, measurements Uij on the predictor trajectory are the daily numbers of eggs 
on day j + 7, and measurements Vik on the response trajectory correspond to the daily 
number of eggs on day k + zt for j = 1, 2, . . . , 10 and k = 1, 2, . . . , 10. The numbers of 
subjects in these bins are 30, 29, 18, 29, 22, 19, 19, 17 and 36, respectively. For each bin, 
we randomly select 15 subjects as the training set and the remaining subjects are used 
to evaluate the prediction performance, comparing the performance of the global and the 
varying-cocfficicnt functional linear regression. The prediction performance is quantified 
by mean squared prediction error (MSPE), defined for each subject i in the test set as 

^10 1 

MSPE,(*) = -Y^iyft. - V,k? and MSPE^i) = - ^(^L - V,k)\ 

k=l k=l 
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17 19 21 23 25 27 29 31 33 



Figure 3. The left panel plots the slope function estimated by the global functional linear 
regression for the egg-laying data and the right panel corresponds to box plots of the ratios of 
MSPE of the varying-coefRcient functional linear regression to that of the global functional linear 
regression for the subjects in the test data set for different levels of the additional covariate Z. 

where yf^. and y'^, denote the predicted daily fecundities corresponding to Vik using the 
global (resp. the proposed varying-coefEcient (local)) functional linear regression. 

Through pseudo-AIC, the global functional linear regression selects two and three 
principal components for the predictor and response trajectories, respectively, while the 
varying-coefBcient functional linear regression uses two principal components for both 
trajectories. After smoothing, the slope functions estimated by the varying-coefRcient 
models are plotted in Figure 2 for different values of Z and the estimated slope function 
for the global functional linear regression is plotted in the left panel of Figure 3. Box 
plots of the ratio MSPE/(z)/MSPEg(z) for subjects in the test data set are shown in the 
right panel of Figure 3 for different levels of the covariate Z. There is one outlier above 
the maximum value for Z = 18 which is not shown. For most bins, the median ratios 
are seen to be smaller than 1, indicating an improvement of our new varying-coefficient 
functional linear regression. Denoting the average MSPE (over the independent test data 
set) of the global and the varying-coefficient functional linear regression by MSPEg and 
MSPE;, respectively, the relative performance gain (MSPE; - MSPEg) /MSPEg is found 
to be —0.0810 so that the prediction improvement of the varying-coefficient method is 
8.1%. 

Besides prediction, it is of interest to study the dependency of the future egg-laying 
behavior on peak egg-laying. From the changing slope functions in Figure 2, we find that, 
for the segments close to the peak segments, the egg-laying pattern is inverting the peak 
pattern, meaning that sharper and higher peaks are associated with sharp downturns, 
pointing to a near-future exhaustion effect of peak egg-laying. In contrast, the shape 
of egg-laying segments further into the future is predicted by the behavior of the first 
derivative over the predictor segment so that slow declines near the end of peak cgg- 
laying are harbingers of future robust egg-laying. This is in accordance with a model of 
exponential decline in egg-laying that has been proposed by Miiller et al. (2001). 
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Figure 4. Plots of predictor processes: tlie left panel for the global functional linear regression 
and the right panel for different bins according to the additional covariate in the varying-coefR- 
cient functional linear regression. 

6.2. BLSA data with scalar response 

As a second example, we use a subset of data from the Baltimore Longitudinal Study 
of Aging (BLSA), a major longitudinal data set for human aging (Shoek et al. (1984), 
Pearson et al. (1997)). The data eonsist of 1590 male volunteers who were scheduled to 
be seen twice per year. However, many participants missed scheduled visits or were seen 
at other than scheduled times so that the data are sparse and irregular with unequal 
numbers of measurements and different measurement times for each subject. For each 
subject, current age and systolic blood pressure (SBP) were recorded during each visit. 
We quantify how the SBP trajectories of a subject available in a middle age range between 
age 48 and age 53 affect the average of the SBP measurements made during the last 
five years included in this study, at an older age. The predictor domain is therefore of 
length five years and the response is scalar. The additional covariate for each subject 
is the beginning age of the last five-year interval included in the study. After excluding 
subjects with less than two measurements in the predictor, 214 subjects were included 
for whom the additional covariate ranged between 55 and 75. We bin the data according 
to the additional covariate, with bin centers at ages 56.0, 59.0, 62.0, 65.0, 68.5 and 73.0 
years and the numbers of subjects in each of these bins are 38, 33, 38, 32, 39 and 34. 

We randomly selected 25 subjects from each bin for model estimation and used the 
remaining subjects to evaluate the prediction performance. In contrast to the egg-laying 
data, the predictor measurements in this longitudinal study are sparse and irregular. 
Pseudo-BIC selects two principal components for the predictor trajectories for both global 
and varying-coefficient functional linear regressions. Using the same criterion for relative 
performance gain as in the previous example, the varying-coefficient functional linear re- 
gression achieves 11.8% improvement compared to the global functional linear regression. 
Estimated slope functions are shown in Figure 5 and predictor trajectories in Figure 4. 

The shape changes of the slope functions with changing covariate indicate that the 
negative derivative of SBP during the middle-age period is associated with near-future 
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z=560 z=59.0 z=62.0 




0.5 1 1.5 2 2 5 3 3.5 4 4.5 5 2 4 2 4 2 4 



Figure 5. The estimated slope function via the global functional linear regression and the new 
proposed varying-coefRcient functional linear regression (for different levels of Z) are plotted as 
the solid lines in the left and right panels, respectively. 

SBP. Further into the future, this pattern is reversed and an SBP increase near the right 
end of the initial period is becoming predictive. 

7. Concluding remarks 

Our results indicate that established functional linear regression models can be improved 
when an available covariate is incorporated. We implement this idea by extending the 
functional linear model to a varying-coefiicient version, inspired by the analogous, highly 
successful extension of classical regression models. In both application examples, the 
increased flexibility that is inherent in this extension] leads to clear gains in prediction 
error. In addition, it is often of interest to ascertain the effect of the additional covariate. 
This can be done by plotting the regression slopes for each bin defined by the covariate 
and observing the dependency of this function or surface on the value of the covariate. 

Further extensions that are of interest in many applications concern the case of multi- 
variate covariates. If the dimension is low, the smoothing methods and binning methods 
that we propose here can be extended to this case. For higher-dimensional covariates or 
covariates that are not continuous, one could form a single index to summarize the covari- 
ates and thus create a new one-dimensional covariate which then enters the functional 
regression model in the same way as the one-dimensional covariate that we consider. 

As seen in the data applications, the major applications of the proposed methodology 
are expected to come from longitudinal studies with sparse and irregular measurements, 
where the presence of additional non-functional covariates is common. 

Appendix: Auxiliary results and proofs 

We note that further details, such as omitted proofs, can be found in a technical report 
that is available at http://www4.stat.ncsu.edu/~wu/WuFcLnMueller.pdf. 
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A bivariate kernel function K2(-,-) is said to be of order {v^t) with v = (1^1,1^2) if it 
satisfies 

r rO, 0<il+£2<iJl^l^l,e2^l^2, 

/ U^^V^^ K2iu,v) dudv = < v\, li—Vi^i2~V2^ (A.l) 

and 

J \u^^v^^K2{u,v)\dudv <oo for any £1 + £2 = ^, (A. 2) 

where vl — vi\ ■ 1/2^.. Similarly, a univariate kernel function ki(-) is of order {v,£) for a 
univariate i/ = i^i when (A.l) and (A. 2) hold for £2 = on the right-hand side while 
integrating over the univariate argument u on the left. 
We introduce the following technical conditions: 

(i) The variable S has compact domain S. Given Z = z, S has conditional density 
fs,z{s)- Assume, uniformly in z & Z, that -^,fs.z{s) exists and is continuous for 
£ — 2 on S and, further, inf^g^ /5^2(s) > 0, analogously for T. 

(ii) Denote the conditional density functions of (3,11) and {T,V) by gx,z{s,u) and 
gY,z{t,v), respectively. Assume that the derivative -^gx^z{s,u) exists for all 
arguments (s,u), is uniformly continuous on 5 x M and is Lipschitz continuous 
in z, for t = 2, analogously for gY,z(t,v). 

(iii) Denote the conditional density functions of quadruples (^i, ^2, t^i, C/2) and 
(Ti,r2, Vi, V2) by g2X,z{si,S2,ui,U2) and .92^,2(^1, ^2,wi,W2), respectively. For 
simplicity, the corresponding marginal conditional densities of (51, 5*2) and 
{Ti,T2) arc also denoted by 52Jf.z(si, S2) and (72y,z(^i, ^2), respectively. Denote 
the conditional density of {S,T,U,V) given Z = z by gxY,z{s,t,u,v) and, sim- 
ilarly, its corresponding conditional marginal density of {S,T) by gxY,z{s,t). 

Assume that the derivatives — ^ — Frfl2X z(si, S2, UI7 W2) exist for all arguments 

(si, S2, wi, "1*2), are uniformly continuous on x and are Lipschitz continu- 
ous in z for ^1 -I- ^2 = ^, < £1, £2 < ^ = 2, analogously for g2Y,z{ti,t2,vi,V2) and 
gxY,z{s,t,u,v). 

(iv) For every p= 1,2,...,P, 0, n^w f^b*^ ,^^^^ -^00, n^(p) f^b^^ .^^,,^ < 00, 

bY,z(p) ^ "z(p),/i^y,z(p> ~^ °° ^z(p),hbY,z(p) < 00 as 71 -> 00. 

(v) For every p= 1,2,...,P, hx^^(p) 0, ^00, < 00, 
^y,z(p) ^ 0' "-2(p),/i^y^2(p) 00 and n^M i^h^ ,^^^^ < cxj as n 00. 

(vi) For every p = 1,2, . . .,P, /ii,^(p)/ft.2,z(p) ^ 1; ^i,z(p) ^ 0, n^(p) ^- 00 and 
'^zM,hh\ ,,(j,) < 00 as n ^ 00. 

(vii) For every p= 1,2,.. .,P, 6x,z(p),y^0, n^(p)_,j5^^^(p,_^ ^ 00, '^■z(p),/i^x,z(p),y < 
00, V,z(p),v 0, ?T-z(P),h^y^^(p) ^ 00 and nz(p),/i^y^z(p),v < 00 as n-^ c». 

(viii) Univariate kernel ki and bivariate kernel K2 are compactly supported, absolutely 
integrable and of orders (i^, = (0,2) and ((0,0), 2), respectively. 
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(ix) Assume that sup^^^-j^^x^ -^(^(^('^) ^ fJ-x,z{s)y\Z = z)) < oo, and analogously 
for Y. 

(x) The slope function (3{z,s,t) is twice differentiable in z, that is, for any (s,i) S 
S xT, -^f3[z,s^t) exists and is continuous in z. 

(xi) The bin width h and smoothing bandwidth h are such that b/h < oo as n — >■ oo. 
The bin width h is chosen such that P oc n^/®. 

Proposition 1. For En defined in (3.1), under (xi), it holds that P{En) ^1 as n — ^ oo. 

Proof. First, note that P(minn^(p) h> n)>l — X]^=i ^("-z(p) /!.<"■)■ Consider the pth 
bin and let tt^ = P{Z <E [z*-^' — — ■!)). Then n^{p) ^ is asymptotically distributed 

as N{mTp,n'Kp{l — TTp)) due to the normal approximation to a binomial random vari- 
able. Thus, P{n^(p) f^ > n) — )- /Ar(o,i)(ap)/ip with Op = — (n— niTp) / ^Jn-Kpil — TTp), where 
/Af(o,i)(') is the probability density function of the standard normal distribution. Due to 
[Al],' ^.p is bounded between fz/{fz + {P - l)fz) and fz/{{P - l)fz + fz)- It follows 
that P{En) — >■ 1 as n — oo by noting that h oc ^/n, P oc ri"'^/'^, and f n (o / decays 
exponentially in x. □ 

We next prove the consistency of the raw estimate of the mean functions of predictor 
and response trajectories within each bin. Consider a generic bin [z — h/2, z + h/2), with 
bin center z and bandwidth h, and let bx,z and by^z be smoothing bandwidths used to 
estimate fix.zis) and ^Y,zit), hx.z and hY,z for Gx.zisi, S2) and Gr.zlii, ^2), respectively, 
hi^z and h2^z for CxY,zis,t), and bx,z,v and fey^z.y for Vx,z{s) = Gx,z{s, s) + aj^ and 
^y,z(i) = GY^z{t,t) + ay, respectively. 

For a positive integer ^ > 1, let {ipp(t, v),p = 1,2, . . . ,1} he a. collection of real functions 
•(/ip : — ?► M satisfying the following conditions: 

[CI. la] The derivative functions ■^^p{t,v) exist for all arguments {t,v) and are uni- 
formly continuous on T x R. 
[C1.2a] / Jiljl{t,v)gY^z{t,v)dvdt<oo. 

[C2.1a] Uniformly in z G Z, bandwidths by.z for one-dimensional smoothers are such 
that by^z 0, Uz^hb'^^ — >■ 00 and Uz^hby'^'^ < 00 as n — )■ 00. 

Define iJLpi,,z = lJ-pii,At) ^ IF I '4'p{t,v)9Y.z{t,v) dv and 

1 1 ^' 

^^,hby+ ^^^ ^ EN V "y,. 

where gy^zit, v) is the conditional density of (T, V), given Z = z. 

Lemma 1. Under conditions [A0]~[A3] (i), (ii), (viii), [CI. la], [CI. 2a] and [C2.1a], we 
have Tpn = sup(3_j)g2xr l*p«,z W - lJ-p4,,z{t)\/ {h + {y^n^b'^^^Y^) = Op(l). 



T —f 
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Proof. Note that \^pn,z{t) " < \^pn,z{t) - E-^pnAi)\ + \E-^pnAi) ^ ^^pi>,z{i)\ 

and -E|Tp„| =0(1) implies that Tp„ = Op(l). Standard conditioning techniques lead to 



"y,z V V 



Tn-t 



JY,z 



h h 



For Zi = Zi € [z — h/2, z + h/2), perform a Taylor expansion of order £ on the integrand: 



E 



i'p{Tii,Vii)t<.i 



bY,z 



i^p{ti,vi)gY,z, 



ti-t 



-g^{^p{t,vi)gY,z, {t,vi)) 



; — «i 



{'4>p{t,vi)gY,zi{t,vi)) 



dti dvi 



'JY,z 



where t* is between t and h. Hence, \E[^p{Ta,Vii)Ki{^^)] — ^Jip4:,zA^)b'p'^\ < co-jf- x 
J \u^Ki{u)\du due to [CI. 2a] and the assumption that the kernel function ki(-) is of 
type {v, £), where cq is bounded according to [CI. la], cq < sup^^. t)^zxT 1^ / wi) x 
gY,zi{t, vi) dvi \ < oo. Furthermore, using (ii), we may bound 

SWp\E^pn.z{t) - Hpi,,zit)\ 



(A.3) 



h h 

z < < - 

2 - 2 



<cob'y-:/i£l) J \u'K,iu)\du 

+ eIe aup\^ip^^Zi{t) - fJ.p4,,z{t)\ 
I Iter 

<co( [ \u'Ki{u)\du]b'y-^^ /{£l) + Cih, 



where the constants do not depend on z. To bound Esupf^-j-\'^pn.zit) — E^!pn^zit)\, we 
denote the Fourier transform of ki(-) by Ci(*) = / e~'"*Ki(M) dw, and letting (ppn.z{u) = 
itl T.meu. , -m c™^"^ ^Ap(T„j, r^j), we have 



1 y —y 



T ■ — f 

bY,z 



,h i=i 
fp7i,z (u)e"'*"Ci (w6y>) du 
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and SUPjg^- \'^pn.z{t) - E'^pn,z{t)\ < 2„b^ / \Vpn,ziu) - Eippn^^{u)\ ■ |Ci (m&f,^) | du. 

Decomposing (ppn,z{') into real and imaginary parts, 

1 1 

'Ppn,z,R{u) = — V — Vcos(ur„„)?/>p(T„y,r„y), 

1 1 

^pn,z,l{u) = ^ — ^ sin (ur„y ) V-p (T;„j- , ^mj ) , 

we obtain £^|(/5pn,z(u) - = i^|</5pn,2,fl:(u) - i^<(fp„,z,_R(w)| + 

^ |ypn,z,j(") - Eippn^:^j{u)\. Note the inequality E\ippn,z,R{u) - Eippn,z,Ri'^)\ < 
\/E|<<3pn,z,fl(M) - Eippn^zM'^W ^nd tlic fact that {[Z;, A^^, (T^j , y^- )^ J: i G 7\4,/i} are 
i.i.d. implies that 



var((pp„,,,fl(M)) < — i;{i;(V'p(r™i,y„a)|z - h/2 < < z + /i/2)}, 



where m G N^^h, analogously for the imaginary part. As a result, we have 

£;sup|^'p„.^(t) -£;*p„,^(i)| 

ter 



< 



2y^S{i;(^2(j;^^^ _ <Z„,<z + h/2)] J \Ci{u)\du 



Note that E(ip'^(Tmi,Y„ii)) as a function of Z„j is continuous over the compact domain 



Z and is consequently bounded. Let C2 = 2sup2^g2: y -E'(V'p(7rni, ^mi)) < oo. Hence, we 
have 

E sup I vl/p„,, (t) - E^pr,,z {t)\< ^^/l^i^^)!'^" (^/^&^-+; y\ ( A.4) 

- — Z7T 



ter 



where the constant C2(/ |Ci(^)l du)/(27t) docs not depend on z. 

The result follows as condition [Al] implies that n^ji goes to infinity uniformly for 
z € Z as n — > oo and nz.hbyt'^ < oo implies that b'!^^^ = 0{l / {y/n^b'^J')) . We next 
extend Theorem 1 in Yao et al. (2005a) under some additional conditions. □ 

[C3] Uniformly in z e Z, bx,z ^ 0, n^^hbx z oo, n^.hbx z<^^ ^Y,z 0, T^z^hby ^ y 
GO and riz^hby z < oo as n — > oo. 

Lemma 2. Under conditions [A0]-[A3], (i), (ii), (viii), (ix) and [C3], we have 

\ij'X,z{s) ~ l^xAs)\ tT\ ^ 
sup - — — -j- = Op(l) and 

(z,s)ezxs "+ Wn-z.hOx.z) 
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(A.5) 

sup ■ , ^ T3T = *-'p(l)- 

Proof. The proof is similar to the proof of Theorem 1 in Yao et al. (2005a). □ 

Our next two lemmas concern the consistency for estimating the covariance functions, 
based on the observations in the generic bin [z — h/2,z + h/2). Let {6p(ri^r2,vi,V2),p = 
1, 2, . . . , be a collection of real functions 6'p : — )• M with the following properties: 

[CI. lb] the derivatives — -S — ?r0T,(7'i,r2,wi,W2) exist for all arguments (r'i,r2,i'i,U2) 

and are uniformly continuous on 72.i x TZ^ x for + ^2 = ^, < ^i,i'2 < ^, 
£ = 0,1,2; 

[CI. 2b] the expectation J J J J dp{ri,r2,vi,V2)g{ri,r2,vi,V2) dridr2dvidv2 exists 

and is finite, uniformly bounded on Z; 
[C2.1b] uniformly in z €z Z, bandwidths hy^z for the two-dimensional smoother satisfy 

hY,z 0, nz^hhy^^^ 00, rizjihy^'^ < od as n — > 00. 
Define Qpe,z = QpoA'^iM) = -q^-^ J J 0p{ti,t2,vi,V2)g2YAti'^^2,vi,V2)dvidv2 and 



1 

nz.h^Y^:\/^,ENiEN—l 



Y,z ieM,_ 

Tij — ti Tik — t2 



X X! (^p{Tij,Tik,Vi-j,Vik)K2 

l<j^k<Ni 



Lemma 3. Under conditions [A0]~[A3], (i), (ii), (iii), (viii), [CI. lb] with TZi =T and 
TZ2=T, [CI. 2b] with g{-, ■,-,■)= g2Y^zi', ■,■,■) and [C2.1b], we have 

■dpn = sup i-^ = Op(l). 

{z,ti,t2)eZxTxT h+ {^/n^h'Y'z 

Proof. This is analogous to the proof of Lemma 1. □ 

[C4] Uniformly in z € Z, hx^z 0, n^^hh^ ^ "^z^hh^ z < ^Y,z -> 0, Uz^hhy^^ -> 

00 and Uz.hhy z < 00 as n — > 00. 

The proof of the next result is omitted. 
Lemma 4. Under conditions [A0]~[A3], (i)-(iii), (viii), (ix), [C3] and [C4], we have 

\Gx,zisi,S2) ~ Gx.z{si,S2)\ , , , 

sup 7- , „ ^— r- = Op(l), (A. 6) 
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sup '\-^'^f^lz5:,^"fn'^' = 0,(1). (A.7) 

To estimate variance of the measurement errors, as in Yao et al. (2005a), we first 
estimate Gx,z{s,s) + (y\ (resp. Gy.zit^t) + cry) using a local linear smoother based 
on Gx,i,z{Sii,Sii) ioT 1 = 1,2,..., L^, i^Nz,h (resp. GY.i,z{Tij,Tij) ior j = 1,2, ... ,Ni, 
i G J^z,h) with smoothing bandwidth bx,z,v (resp. bY,z,v) and denote the estimates 
by Vx,z{s) (resp. VV,z(i)), removing the two ends of the interval S (resp. T) to get 
more stable estimates of ax (resp. Uy)- Denote the estimates based on the generic 
bin [z — h/2,z + h/2) by ct^ ^ and ffy^, let |iS| denote the length of S and let 
Si = [inf{s: s £ 5} + |5|/4, supjs : s e 5} - |5|/4]. Then 



<^%. = ^JjVx{s)~GxA'^,s)]ds, 



and analogously for ay ^ ■ Lemmas 2 and 4 imply the convergence of ct^ ^ and ay ^ , as 
stated in Corollary 1. 

[C5] Uniformly in z g Z, bx,z,v 0, n^j.b'^ .^ y oo, n^^hbx^^y < by^y 0, 
nz,hbY z y ~^ oo and riz.hby ^ y < as n — !• oo. 

Corollary 1. Under condition [C5] and t/ie conditions of Lemmas 2 and 4, 

sup \a-x^z - + (V^w!"^x,2,y)~^ + {^/n7J^h\ ..y^) = 0,(1), 

z^Z 

and analogously for ^ ■ 



Proposition 2. Under conditions [A0]-[A3] in Section 2 and (i)-(ix), the final estimates 
and (jy 



of and (jy (2.8) converge in probability to their corresponding true counterparts, that 



IS, 

Proof. The result follows straightforwardly from Corollary 1. □ 

While Lemma 3 implies consistency of the estimator of the variance, we also require 
an extension regarding estimation of the cross-covariance function. Let {0p(s,t, u,v),p = 
1, 2, . . . , be a collection of real functions Bp-.M."^ ^ M. 

[C2.1c] For £>\i^\ + 2 and any pair of h and £2 such that £ = £i+£2, £i>vi + l 
and ^2 > ^^2 + 1, we have, uniformly in z S 2^, bandwidth hi^z and ^,2,2 satisfy 
hi,z 0, hi^z/h2,z 1, nz^hh^i l^^ 00, Uz^hhl^^^ < 00 as n -J> 00. 
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Define g^g^ = Qp^zi^^''-) = gj^^^pT / / fi'pls, i, w, u)gxy,z(s, u, «) dudw and 



X! FN ^ Op{Sij,T.ij,Uij,Vij)K2 



Sij S Tij t 



'-2,2 



Lemma 5. Under conditions [A0]"[A3], (i), (ii), (iii), (viii), [CI. lb] with TZi =S and 
TZ2=T, [CI. 2b] with g{-, ■,-,■)= gxY,z{', ■,■,■) o,nd [C2.1c] (with ii = £2 = ^ and vi = 
V2=0), we have i^pn ^ sup(^^^,^t)^2xSxT\^pn,z{s,t) - Qpg ,{s,t)\/{h+ {^/n^h'^'+'^h'p+'^)-'^) 
Opil). 

Proof. The proof is analogous to tliat of Lemmas 1 and 3. □ 

[C6] Uniformly in z € Z, bandwidtlis hi^z and /i2,z satisfy hi^z 0, hi^z/h2,z ~^ 1, 
nz,hh\ 2 — 00, nz,hh\ ^ < 00 as n — >■ 00. 

Lemma 6 (Convergence of the cross-covariance function between X and Y). 

Under conditions [A0]-[A3], (i), (ii), (iii), (viii), (ix), [C3] and [C6], 

sup |C'xy,z(s,t) - CxY.z{s-,'t)\l {h + {^/n7Ihi.yh2,zy^) = Op(l). 

Proof. The proof is similar to that of Lemma 4. □ 

Consider the real separable Hilbert space Ly(T) ^ Hy (rcsp. L\{S) = Hx) endowed 
with inner product {f,g)HY = /r/(0.9(^)dt (resp. {f,g)Hx = Is fi^)sis) ds) and norm 
ll/lli/x = V{fJ)H. (rcsp. Il/llff, = ^/Ulh^) (Courant and Hilbert (1953)). Let T^^, 
(resp. T'^ ^) be the set of indices of the eigenfunctions (pz^ki^) (resp. il^z,7n{s)) correspond- 
ing to cigenvRluGS Xz.k (resp. Pz,rn 

) of multiplicity one. We obtain the consistency of Xz.k 
(resp. pz,m) for Xz,k (resp. Pz,m), the consistency of 4>z,k{t) (resp. ipz,mis)) for 4>z,k{t) 
(resp. ipz,ni{s)) in the L^- (resp. Lj^-) norm || • [[//^ (resp. || • \\hy) when Xz,k (resp. p^,™) 
is of multiplicity one, and the uniform consistency of 4iz.k{t) (rcsp. ipz,m{s)) for (j)z,k{t) 
(resp. ipz,mis)) as well. 

For /, g, h £ Hy , define the rank one operator f ® g:h^ {f,h)g. Denote the separa- 
ble Hilbert space of Hilbert-Schmidt operators on Hy by Fy = (T2{F[y), endowed with 
{Ti,T2)fy = tr{TiT*) = Ej{TiUj,T2Uj)hy and ||r|||,^ = (T, T) , where Ti, T2, T e Fy, 
T2 is the adjoint of T2 and {uj: j > 1} is any complete orthonormal system in Hy. The 
covariance operator Gy^z (resp. Gy^z) is generated by the kernel Gy^z (resp. Gy^z)^ 
that is, Gy,,(/) -/^Gy,3(ti,t)/(tijdti (resp. Gy.(/) = /^Gy3(ti,tj/(ti) dti). Ob- 
viously, Gy^z and Gy^ are Hilbert-Schmidt operators. As a result of (A. 7), we have 
sup.g^ ||Gy,, - Gy,z\\FY/{h + {./nTTrhl^^) = Op{l). 
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Let Ty.z.i = {j- ^z,j ~ A^.i} and ly , = \i: \TY.z.i\ = l}j where \'LY,z,i\ denotes the num- 
ber of elements in ly.z.i- Define P^^- = Y.keiY...j '^^•'^ ^^^'^ = Efceiy.,,^ 4>z,k 

to be the true and estimated orthogonal projection operators from Hy to the 
subspace spanned by {02, k S ly.zj}- Set S^j = ^mm{\Xz.i ~ Xz,j\- I ^ ^Y,z,j} and 
A^y ~ {c G C: \c — Xz.jl = '^rjli where C stands for the complex numbers. Let Ry.z 
(resp. Ry,z) be the resolvent of Gy,z (resp. Gy,z), that is, Ry,z(c) = (Gy^^ — c/)^^ 
(rcsp. Ry.z{c) = {Gy,z - cl)-'^). Let . = sup{||Ry,2(c)||j-y : ceAgv } and 

ax - (<5^,(A,x (^/Tlll/il.J"')"' (A.8) 
Parallel notation is assumed for the Y process. 

Proposition 3. Under conditions [A0]-[A3] in Section 2 and conditions (i)-(iii), (viii), 
(ix), [C3], [C4] and [C6], it holds that 

(A.9) 

mel'x,z, (A.IO) 

mel'x,z, (A.ll) 

(A.12) 

kel{^,,, (A. 13) 

kel{.^,, (A. 14) 

\^z,mk - Oz,mk\ = Op(max(aA',ay-, h + (^/nZh/ii, 2/12,2)"^)), (A.15) 

where the norms on Hx and Hy are defined on page 29, both ax , cty o,re defined in 
(A.8) and converge to zero as n—> 00 and the above Op terms are uniform in z€ Z. 

Proof. The proof is similar to the proof of Theorem 2 in Yao et al. (2005a). The uni- 
formity result follows from that of Lemmas 4 and 6. □ 

Note that 

/3(z,s,t) = f;g %^^2.™(.)02,.(O- (A.16) 

k=lm=l 

To define the convergence of the right-hand side of (A.16), in the L2 sense, in (s,t) and 
uniformly in z, we require that [A4] YlV=i Sm=i ^l.mkl Pi,m < ^ uniformly for z€ Z. 



\Pz,m - Pz,m \ = Op(Q!x), 
||V'2,m - "(Az.mllHx = Op(Qx), 
SUp|-02,™(s) - i^z,ra{s)\ = Op(ax), 

\Xz,k - ^z,k\ = Op(aY), 
\\4>zM - (i}z,k\\HY = Op(ai'): 

sup \(t)z,k{t) - 4>z,k{t)\ = Op(aY), 
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The proof of the following result is straightforward. 

Lemma 7. Under condition [A4], uniformly in z & Z, the right-hand side of (A. 16) 
converges in the L2 sense. 

The next result is stated without proof and requires assumptions [A4] and the following: 

A51 > — ; — ^—r^ — r^TT — 1 : ^ uniformly in z G Z, 

MK{h + {^WZKhi,,h2,,y^) 0. 
Lemma 8. Under conditions of Proposition 3, [A4] and [A5], 

hmsup/ M)]^ = » p.o«,. (A.17) 

Proof of Theorem 1. We consider only the convergence of /3{z,s,t). The consistency 
of fi.x,z{s) and fiY,zit) is analogous. First, note that 

{f3{z,s,t)- I3{z,s,t)fdsdt 



TJS 



p 

<2{2b/h+l)y2^02iz^''\z,bf [ [ 0{z^P\s,t)- /3{z^P\s,t)fdsdt (A.18) 

+ 2j^jj^ujo^2{z^P\z,b)P{z^P\s,t)-p{z,s,t)^ dsdt, 

where the 2h/h + 1 in the last inequality is due to the fact that the kernel function 
K{-) is of bounded support [-1,1]. Let a{k) = 'Zp^i^biz^P^ - z){z^p'> - z)^ b{k) = 
Kb{z^P^ ~ z)2(z(p) -zf,Hk = J K{u)u^ du and i^fc = / {K{u)Yu^ du. We then have 

a{k)^^ik\-{l + o{l)) and 5(fc) = z.fe^(l + o(l)) 
h h 

for small h (large P oc 1/h) and small b. Moreover, the usual boundary techniques can 
be applied near the two end points. Consequently,we have 

p 

^wo,2(z(P\z,&f =e?;2(C^WC)-i(C^WWC)(C^WC)-iei.2 
p=i 
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_ T faiO) a(l)y75(0) 6(l)Wa(0) a(l)y' 
-""'''[ail) a{2)J [b{l) bi2)j[ail) a{2) J ^^'^ 



,, ,(1 + 0(1)). 

Due to the compactness of Z, the above o-term is uniform in z Z. This imphes that 
/ ± .o.(z'^) , 6)- dz ^ f ^^-" - ^^^^^ V ^'"^ )(tW^ + "(D) (A-19) 

for small h and 6, where \Z\ denotes the Lebcsgue measure of Z. Hence, (A. 19) and the 
consistency of I3{z,s,t) in the L2 sense in {s,t) and uniformly in z due to (A. 17) imply 
that 



r p 

j Y.uoM^P\z,hf ( f mz^P\s,t)-l3{z^P\s,t))) 

p—1 JTJS 



^dsdt 



dz40. (A.20) 



For the second part in (A. 18), applying a Taylor expansion of (3{z'^p\ s,t) at each z, we 
have 



p 



Y,U^.2{z^P\z,h)p{z^P\s,t) 



p=l 



•^1,2 



a{\) a{2)) l^a(l)j^^^'''^^ + *'i'2(^a(l) a{2) ) \a{2)) dz'^^'''''^' 



a(2)) (a(3))^'^("'^'')+ higher order terms 



1^2M2_lMlM3_^ 

2 fiom~ IJ'i dz'^' 



- /3(z, s,i) + -6^^ 2 T~2''^(-^' '''^) + higher order terms. 



Hence, wo,2(^(p', ^, 6)/3(z(^'), s, t) - /3(z, s, t) = lb^d^^£,l3iz, s, t)(l + o(l)) and 

J J J (^^iOo.2iz^P\z,b)P{z'-P\s,t)- P{z,s,t)^ dsdtdz 



(A.21) 

1^2 Mi -MlM3 / f f f 



_j^2t^2_j^^ I I / £_;3(,,,,i)dsdtdz)(l + o(l))^0. 



Combining (A.20) and (A.21), and further noting condition (xi), completes the proof. □ 
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Proof of Theorem 2. Note that 

Y* [t] - Y*{t) = ^1Y.Z' (t) - tiY,Z' it) + / s, t) - P{Z*,s, t)){X*{.s) - fix,Z' (s)) ds 

Js 

- / P{Z*,s,t){^x,Z'is)- iJ.x,z*{s))ds. 
Js 

The convergence results in Theorem limply that /^(r*(t) dt 4 0, as desired. □ 
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